- Path: soap.news.pipex.net!pipex!usenet
- From: John Nurick <j.nurick@dial.pipex.com>
- Newsgroups: comp.lang.pascal.delphi.misc,comp.lang.c,comp.lang.pascal.misc,comp.lang.c++
- Subject: Re: WORD FILE FORMAT (WINDOWS)
- Date: 13 Mar 1996 07:49:27 GMT
- Organization: UnipalmPIPEX server (post doesn't reflect views of UnipalmPIPEX)
- Message-ID: <4i5um7$7vi@soap.news.pipex.net>
- References: <4hv2ho$d8t@news.interpath.net> <4i2c5e$t70@kiwi.futuris.net> <4i4rb1$snv@gate.stateoftheart.com>
- NNTP-Posting-Host: an051.du.pipex.com
- Mime-Version: 1.0
- Content-Type: text/plain; charset=us-ascii
- Content-Transfer-Encoding: 7bit
- X-Mailer: Mozilla 1.1N (Windows; I; 16bit)
-
- Mike Girou (girou@parashift.com) wrote:
- >Once [Microsoft] publish the internal [Winword file format]
- >standards, then they are pretty well stuck with that standard for
- >life. As long as they, and only they, build code around the internal
- >representations, they are free to change those representations as the
- >need arises (hopefully providing conversion tools).
-
- Just like OOP!
-
-
- gt2497b@acmez.gatech.edu (Joe Novosel) wrote
-
- >Their own programs are not even compatible
- >with themselves. I use Word v6 and v2. I have to use v2 on my notebook
- >because 6.0 is so big. I have to be very careful about file formats because 2.0
- >won't read 6.0's files.
-
- There's a Microsoft filter for Word 2 that reads Word 6 files;
- can't remember the name but I think you can ftp it from microsoft.com.
- Still doesn't support the nice new formatting features in WW6 of
- course ...
-
-
- WinWord 6 file format:
-
- WinWord 6+ and some or all of the rest of MS Office use OLE Structured
- Storage for their files. This means each file has an internal tree
- structure of _storages_ and _streams_, analogous to directories & files.
- It gives a standard interface for storage of OLE objects, and for other
- apps to extract information (e.g. summary info). It is also why WW6 files
- are so much bigger than earlier Word versions'.
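-
A quick way to tell whether a given file uses OLE Structured Storage at all is to look at its first eight bytes, which carry a fixed signature. The sketch below does only that; it does not walk the tree of storages and streams. The function names are made up for illustration.

```python
# Signature bytes found at the very start of an OLE structured storage file.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_file(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE compound file signature."""
    return data[:8] == OLE_SIGNATURE


def is_ole_path(path: str) -> bool:
    """Read just the first eight bytes of a file on disk and test them."""
    with open(path, "rb") as handle:
        return looks_like_ole_file(handle.read(8))
```

A file that fails this test (plain text, RTF, or a pre-WW6 document) has no storages or streams to look for.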
-
- There's a Delphi-centred intro to OLE SS by John Lam in PC Magazine
- (US) 19 December 95, though it doesn't cover WW6 as such. Another
- contributor to this thread mentioned discussion of WW6 files in PC
- Mag; like him/her I recollect it but can't find it.
-
- Word tradition from way back has been to store the text as straight
- text with no formatting instructions or tags. All the formatting
- is in separate tables or lists at the end. This made it pretty easy
- to extract plain text from a not-too-complicated Word doc: the text
- began immediately after the file header, and the header contained a
- pointer to the first byte of the formatting tables.
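-
As a rough sketch of that extraction: once you know where the text starts and where the formatting tables begin, pulling the text out is just a slice. The offsets are passed in as parameters here because reading them from the header depends on the Word version, which this sketch does not attempt. The cp1252 code page and the carriage return as paragraph end are assumptions, and the function name is hypothetical.

```python
def extract_plain_text(data: bytes, text_start: int, text_end: int) -> str:
    """Slice the text region out of a document image and tidy it up.

    text_start and text_end are byte offsets supplied by the caller;
    text_end is the first byte of the formatting tables.
    """
    if not 0 <= text_start <= text_end <= len(data):
        raise ValueError("text offsets fall outside the file")
    raw = data[text_start:text_end]
    # Assumption: 8-bit Windows text (cp1252); undefined bytes become U+FFFD.
    text = raw.decode("cp1252", errors="replace")
    # Assumption: paragraphs end with a bare carriage return.
    text = text.replace("\r", "\n")
    # Drop the remaining non-printing control characters (footnote marks etc.).
    return "".join(ch for ch in text if ch in "\n\t" or ch >= " ")
```

For example, with a 6-byte pretend header followed by 12 bytes of text, `extract_plain_text(data, 6, 18)` returns just the text with paragraph ends turned into newlines.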
-
- The same basic approach should work with WW6, although you'd have
- to delve into OLE SS to get the offsets of the start and end of text.
-
- With complicated texts (footnotes, annotations, symbols, tables,
- fields), however, things rapidly get a lot nastier.
-
- Footnotes have always been indicated by a single non-printing char
- in the text (the same char for every footnote); the program has
- to keep track of these and maintain pointers to the text of each
- note. From WW6, the same technique is used for symbols (e.g. a single
- char from the WingDings or Symbol font), which is why searching for a
- symbol in WW6 doesn't work properly!
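-
To illustrate the bookkeeping this implies: because every footnote uses the same marker char, the only way to match a marker to its note is by order of appearance. The sketch below pairs the n-th marker in the body with the n-th note. The marker value "\x02" is an assumption for illustration, and the note texts are taken as an already-extracted list; neither function is part of any Word API.

```python
from typing import List, Tuple

FOOTNOTE_MARK = "\x02"  # assumed marker value, for illustration only


def attach_footnotes(body: str, notes: List[str],
                     mark: str = FOOTNOTE_MARK) -> List[Tuple[int, str]]:
    """Pair each marker position in the body with its note, in order."""
    positions = [i for i, ch in enumerate(body) if ch == mark]
    if len(positions) != len(notes):
        raise ValueError("marker count does not match number of notes")
    return list(zip(positions, notes))


def render_inline(body: str, notes: List[str],
                  mark: str = FOOTNOTE_MARK) -> str:
    """Replace each marker with its note text in square brackets."""
    note_at = dict(attach_footnotes(body, notes, mark))
    return "".join(
        f"[{note_at[i]}]" if i in note_at else ch
        for i, ch in enumerate(body)
    )
```

The same positional matching shows why a plain text search cannot tell one marker from another: the char in the text carries no information about what it stands for.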
-
- IMHO the least worst approach to the problem this thread started
- with would be to use WordBasic to export the relevant files as text
- or .RTF and distribute them in that form. No big challenge to write
- a WordBasic program to look out for changes (SaveDate) in the docs
- and update the text or RTF versions (or you can do it with Delphi
- and the WordBasic API).
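-
The "look out for changes" part can be sketched outside Word as well. The example below only decides which documents need re-exporting; the export itself would still be done by Word. It uses the file's modification time as a stand-in for SaveDate, which is an assumption, as are the function names and the convention that the export sits next to the .doc with the same base name.

```python
import os
from typing import List


def needs_export(doc_path: str, export_path: str) -> bool:
    """True if the export is missing or older than the document."""
    if not os.path.exists(export_path):
        return True
    return os.path.getmtime(doc_path) > os.path.getmtime(export_path)


def stale_documents(folder: str, export_ext: str = ".rtf") -> List[str]:
    """List the .doc files in a folder whose exported copy is out of date."""
    stale = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".doc":
            continue
        doc_path = os.path.join(folder, name)
        export_path = os.path.join(folder, base + export_ext)
        if needs_export(doc_path, export_path):
            stale.append(doc_path)
    return stale
```

Run periodically, this gives the list of files to hand to whatever does the actual conversion.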
-
-
- --
- Best wishes
-
- John Nurick
-
- e-mail: j.nurick@dial.pipex.com
- v-mail: <+44|0> 191 281 1306